Abstract

Failure detectors (or, more accurately, Failure Suspectors, FS) appear
to be a fundamental service upon which to build fault-tolerant,
distributed applications. This paper shows that an FS with very weak
semantics (i.e. one that delivers failure and recovery information in no
specific order) suffices to implement virtually-synchronous
communication (VSC) in an asynchronous system subject to process crash
failures and network partitions. The VSC paradigm is particularly useful
in asynchronous systems and greatly simplifies building fault-tolerant
applications that mask failures by replicating processes. We suggest
a three-component architecture to implement virtually-synchronous
communication: 1) at the lowest level, the FS component; on top of
it, 2a) a component that defines new views, and 2b) a component that
reliably multicasts messages within a view. The issues covered in this
paper also lead to a better understanding of the various membership
service semantics proposed in recent literature.



* The first author is on leave from Ecole Polytechnique Fédérale de Lausanne, Switzerland.
His research is supported by the “Fonds national suisse” under contract number
21-32210.91, as part of the European ESPRIT Basic Research Project Number 6360
(BROADCAST). The second is supported by DARPA/NASA Ames Grant NAG 2-593,
and by grants from IBM and Siemens Corporation.



1  Introduction 


There have recently been several papers about membership services in asynchronous
systems [2, 12, 13, 17, 18, 19, 20]. A membership service is responsible for giving each process
(consistent) information about the operational processes in the system. A process calls this
information its view of the system processes. A membership service typically reacts to process
crashes or recoveries, leading it to define a set of views. The membership services mentioned
vary according to the underlying failure model considered, as well as the properties they
provide with respect to the set of views delivered to each process (e.g. whether another
view may exist simultaneously, the degree of agreement among members):

•  [17,  18]  consider  processes  with  crash  failure  semantics,  excluding  network  partitions. 

•  [19,  20]  consider  systems  in  which  processes  may  crash  and  the  network  may  partition. 
However,  despite  network  partitions,  this  membership  service  defines  only  majority 
views  -  a  unique,  totally-ordered  sequence  of  views.  Such  a  membership  service  is  said 
to  have  linear  semantics. 

•  The  membership  services  described  in  [1,  2,  13]  consider  the  same  failure  scenario  as 
above,  but  only  define  a  partial  order  on  the  views.  That  is,  if  the  system  is  partitioned 
in  two  (or  more)  subnetworks  then  two  (or  more)  views,  one  in  each  subnetwork,  may 
exist  concurrently. 

Concurrent views offer an interesting extension to membership services, and force us to
consider a further semantic distinction based on whether concurrent views are permitted to
intersect. If two concurrent views may overlap, we say the membership service semantics
are weak-partial; if they may not, we say the semantics are strong-partial. Among those
that permit concurrent views, [2] appears to be a strong-partial membership service, [13]
considers both strong-partial and weak-partial membership services, and [1] and [12] consider
only weak-partial membership services. These variants raise a new, pertinent question: when
is a strong-partial service required, and when does a weak-partial membership service suffice?
The objective of this paper is to suggest an answer to this question, by showing that a
strong-partial membership service is intimately related to virtually-synchronous communication.
We do not discuss when a linear membership service is required.

The idea of virtually-synchronous communication (VSC) was first introduced by Isis [8, 4].
VSC can be understood as a rule for ordering message deliveries (reliable multicasts) with
respect to view changes (received from the membership service). We give a precise definition
of VSC in Section 5.4. VSC defines a powerful model for building fault-tolerant processes
that mask failures by replication. It has also been argued [5] that ordering message deliveries
consistently around process failures and recoveries is a fundamental part of any distributed
computation; thus VSC is a vital primitive for inherently-distributed programming.
Relatedly, many common distributed applications are more easily understood and solved if they
can make use of VSC [21]. Finally, if the VSC abstraction we define in this paper is
augmented with a majority requirement, [22] shows it is a powerful model in which transaction
commit is easily (albeit probabilistically) implemented. Understanding that the VSC
abstraction is more basic than the transaction abstraction gives broader insight into the problem
of building fault-tolerant applications. However, we note that solving VSC is not equivalent
to solving consensus [10].

Traditionally, virtually-synchronous communication has been implemented with a two-component
architecture: a membership service and, on top of it, a multicast component. However,
understanding the relationship between a membership service and virtually-synchronous
communication has led us to consider a three-component architecture, with (1) a Failure
Suspector component FS delivering information about the communication topology, (2)
a View Component VC defining views, and (3) a Multicast Component MC implementing
virtually-synchronous communication. We divide the functionality of the traditional
membership service between our FS and VC components.

In  addition  to  increasing  our  understanding  of  the  relationship  between  any  membership 
service  and  virtually-synchronous  communication,  this  architecture  allowed  us  to  specify 
precisely  the  FS  semantics  needed  to  guarantee  VC  and  MC  liveness.  One  weakness  of 
previous  work  in  this  area  has  been  a  lack  of  precise  semantics  for  the  FS  part  of  the  system. 

Explicitly,  the  paper  shows: 

•  that virtually-synchronous communication satisfying the definition given in Section 5.4
can be implemented with a modular, three-component architecture for system models
with both process crash failures and network partitions (i.e. link failures). We start
with a very simple model, and from it construct a useful communication primitive for
fault-tolerant, distributed applications.

•  how to define concurrent views that have empty intersections. That is, how to
implement strong-partial membership semantics in a system that may partition. The basic
idea is to define a view as a set of pairs (process identifier, process sequence number).

•  that if we remove the MC component from the architecture (e.g. if virtually-synchronous
communication is not needed), then the view component defines views that do not
satisfy the empty intersection condition (i.e. giving a membership service with
weak-partial semantics).

Section 2 describes our low-level system model and the interaction of the three components.
Section 3 gives a precise semantics for the failure suspector. Sections 4 and 5 sketch how to
implement the VC_p and MC_p components, and Section 6 completes the VC_p and MC_p protocols.
We conclude in Section 7.


2  System  Model 

Our low-level system model consists of an infinite name space of process identifiers,
Proc = {p_1, p_2, ...}. The name space is infinite to model infinite executions in which processes
continually fail and recover. At any point in time, however, there are only a finite number of
executing processes under consideration and we restrict our attention to these. For this finite
set of executing processes, we assume a completely-connected network of FIFO channels.
Processes communicate by passing messages over these channels, though the channels too may fail.
The system has no global clock, and message transmission delays are unbounded. Processes
fail by crashing, which we model by the local event crash_p. We model the recovery of a
process with a new identifier. A process p may (1) send a message to another process, (2)
deliver a message sent by another process q, and (3) perform local computation.

A history, h_p, for process p is a sequence of events beginning with the event start_p and
terminating, if at all, with the event crash_p:

\[ h_p = \mathit{start}_p \cdot e_p^1 \cdots e_p^k, \qquad 0 \le k. \]

A cut is an n-tuple of process histories, one for each p ∈ Proc. We assume familiarity with
inter-event causality [15] and with consistent cuts [8].

Crash failures are surprisingly difficult to handle in an asynchronous system. Fischer
et al. [10] show that, because it is impossible to distinguish a crashed process from one
that is just very slow, any problem requiring “all correct processes” to agree on some value
cannot be solved deterministically; that is, no deterministic protocol can make progress if it
must also make accurate process failure detections. One way around this is for asynchronous
systems to incorporate some mechanism for suspecting failures, as well as a means of
handling failure suspicions consistently (e.g. p may suspect q faulty while r may not; perhaps
r and/or q even suspect p). Our system model assumes a failure suspector that eventually
suspects a crashed process,^1 which suffices to ensure our protocols make progress. We do
not require anything more of the failure suspector.

Figure 1: FS_p, MC_p, and VC_p interaction for virtually-synchronous communication.

Each process has three components that interact to implement the virtually-synchronous
communication primitive for application-layer processes (Figure 1). The Failure Suspector
(FS_p) is at the lowest level and notifies both the Multicast Component (MC_p) and the View
Component (VC_p) about suspected changes in the communication topology. Such changes
arise from actual process and link failures, as well as high processor loads and heavy
network traffic (indistinguishable from true failures). VC_p defines p's current view, View_p(), an
approximation of the set of processes with which p can communicate, and sends View_p() to
MC_p. MC_p is responsible for reliably multicasting application-layer messages until it receives
an accessibility-change notification from FS_p. These notifications signal a suspected change
in the communication topology and the attendant need to alter View_p(). However, neither
MC_p nor VC_p can do this naively since virtually-synchronous communication requires that
members of View_p() that also accompany p to its next view receive the same set of messages
that were multicast within View_p(). (We make this definition precise in Section 5.) To ensure
this, MC_p delivers all outstanding multicasts, and does not issue new multicasts except to
forward those that have been only partially delivered. View_p() is safely terminated when all
messages multicast in it are delivered at all sites that MC_p believes non-faulty. When MC_p
detects this condition (Section 4) it informs VC_p, which then determines a new view for MC_p
from the accessibility notifications it received from FS_p.

Section 3 describes the properties our Failure Suspector components must satisfy. These are
weak yet reasonable requirements, and are easily implemented in any asynchronous system.
Section 4 discusses VC_p, and Section 5 discusses MC_p. These components execute protocols
to detect global properties [8, 16].

^1 This can easily be implemented with time-outs.


3  The  Failure  Suspector 

Given process p, FS_p emits a sequence of not-comm(q) and comm(r) suspicion messages to MC_p
and VC_p. Since the system is asynchronous we cannot guarantee the accuracy or timeliness of
these suspicions; the most we can require is that FS_p eventually suspects true crashes and
recoveries. This is not unreasonable. It is known that fault-tolerant protocols in asynchronous
systems cannot make progress if they are required to make accurate failure determinations.

Our approach introduces an inaccurate failure suspector to gain liveness. On the other hand,
we cannot require FS_p to suspect all periods of transient inaccessibility - a network partition
may repair before it is noticed.

Since, in theory, FS_p may suspect processes arbitrarily, we have divorced FS_p implementation
from the problem at hand. In a real system, FS_p might take cues from the underlying
communication layer, the operating system, response delays, and so forth.^2

On every consistent cut c, FS_p maintains two non-intersecting sets, CommSet_p(c) and NotCommSet_p(c).
When FS_p suspects q ∈ CommSet_p(c), q is removed from CommSet_p(c) and is thereafter a
member of NotCommSet_p(c). Whenever these sets change, FS_p notifies VC_p and MC_p by
emitting the appropriate comm() or not-comm() messages.

We have a reciprocity condition for (perceived) partitions, as well. To model the nature of
network partitions, we require eventual reciprocity of inaccessibility suspicions. That is, if
FS_p suspects q then eventually either FS_q suspects p or q fails.
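
As an illustration only, the bookkeeping just described might be sketched in Python as follows. The sketch assumes a time-out rule for raising suspicions, which is one possible choice and is not prescribed by the model. The class name FailureSuspector, the methods heard_from and tick, and the use of plain callbacks to stand in for VC_p and MC_p are all hypothetical names introduced for this example.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Set, Tuple

Notification = Tuple[str, str]  # ("comm" | "not-comm", process identifier)


@dataclass
class FailureSuspector:
    """Hypothetical time-out based FS_p; the API is illustrative only."""
    owner: str
    timeout: float
    comm_set: Set[str] = field(default_factory=set)
    not_comm_set: Set[str] = field(default_factory=set)
    last_heard: Dict[str, float] = field(default_factory=dict)
    listeners: List[Callable[[Notification], None]] = field(default_factory=list)

    def _emit(self, kind: str, q: str) -> None:
        # Notify VC_p and MC_p, modelled here as plain callbacks.
        for listener in self.listeners:
            listener((kind, q))

    def heard_from(self, q: str, now: float) -> None:
        """Record traffic from q; q becomes (or stays) a CommSet member."""
        self.last_heard[q] = now
        if q not in self.comm_set:
            self.not_comm_set.discard(q)
            self.comm_set.add(q)
            self._emit("comm", q)

    def tick(self, now: float) -> None:
        """Suspect every CommSet member that has been silent for too long."""
        for q in sorted(self.comm_set):  # sorted() copies, so removal is safe
            if now - self.last_heard[q] > self.timeout:
                self.comm_set.remove(q)
                self.not_comm_set.add(q)
                self._emit("not-comm", q)
```

Under these assumptions the two sets stay disjoint by construction, and a notification is emitted only when a set actually changes. The sketch makes no attempt to satisfy the reciprocity or propagation conditions, which require cooperation between suspectors.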

A logical formula holds on a consistent cut. The membership of an indexical set of processes
depends on when it is considered. In our model, ‘when’ translates to consistent cuts, the only
physically-realizable instances. We use the following formulas and indexical sets to specify
the behavior of FS_p.

•  NotComm_p(q) holds on c if q ∈ NotCommSet_p(c)

•  Comm_p(q) holds on c if q ∈ CommSet_p(c)

•  DOWN_q holds on c = (h_1, ..., h_q, ..., h_n) if crash_q is the last event in h_q

•  UP_q holds on c = (h_1, ..., h_q, ..., h_n) if crash_q is not an event in h_q.

^2 For example, to detect failures FS_p could query a process, deeming it inaccessible if it does not respond
in a timely fashion (inaccurate, but satisfying the requirement). We might put the onus on a process to
announce its recovery.


Non-triviality Conditions for FS_p

Crashes: If q crashes, then eventually either p crashes or FS_p suspects q is unreachable:

\[ \mathrm{DOWN}_q \;\Rightarrow\; \Diamond\bigl(\mathrm{NotComm}_p(q) \lor \mathrm{DOWN}_p\bigr) \]

Recoveries: If q begins executing and is reachable, then eventually either p crashes or FS_p
suspects q is reachable:

\[ \mathrm{UP}_q \;\Rightarrow\; \Diamond\bigl(\mathrm{Comm}_p(q) \lor \mathrm{DOWN}_p\bigr) \]

Reciprocity: If FS_p suspects q is inaccessible, then, if q does not crash, it eventually suspects
p is inaccessible:

\[ \mathrm{NotComm}_p(q) \;\Rightarrow\; \Diamond\bigl(\mathrm{DOWN}_q \lor \mathrm{NotComm}_q(p)\bigr) \]

This is an artifact of p suspecting q: since p ceases communicating with q, p is, in fact,
inaccessible to q.


Propagation Conditions for FS_p

Finally, we require failure suspectors to gossip among themselves.

Inaccessibility Propagation: If FS_p believes, on cut c, it cannot communicate with q then
it tries to propagate this belief to every FS_r for r ∈ CommSet_p(c):

\[ \mathrm{NotComm}_p(q) \;\Rightarrow\; \Diamond\bigl(\mathrm{NotComm}_r(q) \lor \mathrm{NotComm}_r(p)\bigr) \]

Accessibility Propagation: If FS_p believes, on cut c, it can communicate with q then it
tries to propagate this belief to every FS_r for r ∈ CommSet_p(c):

\[ \mathrm{Comm}_p(q) \;\Rightarrow\; \Diamond\bigl(\mathrm{Comm}_r(q) \lor \mathrm{NotComm}_r(p)\bigr) \]
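
Each condition above has the shape "P implies eventually Q". As a rough aid to intuition, the following Python sketch checks such a condition on a finite sequence of cuts. This is only an approximation: a finite prefix can show that Q has not yet happened, but cannot refute that it will happen later. The Cut record and the functions eventually_implies and reciprocity_holds are hypothetical names introduced for this example.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Sequence, Set


@dataclass(frozen=True)
class Cut:
    """Hypothetical snapshot: each process's NotCommSet plus the crashed set."""
    not_comm: Dict[str, Set[str]]
    down: Set[str]


def eventually_implies(
    trace: Sequence[Cut],
    antecedent: Callable[[Cut], bool],
    consequent: Callable[[Cut], bool],
) -> bool:
    """True if, wherever `antecedent` holds, `consequent` holds then or later."""
    for i, cut in enumerate(trace):
        if antecedent(cut) and not any(consequent(c) for c in trace[i:]):
            return False
    return True


def reciprocity_holds(trace: Sequence[Cut], p: str, q: str) -> bool:
    """NotComm_p(q) => eventually (DOWN_q or NotComm_q(p)), on a finite trace."""
    return eventually_implies(
        trace,
        lambda c: q in c.not_comm.get(p, set()),
        lambda c: q in c.down or p in c.not_comm.get(q, set()),
    )
```

The other four conditions can be checked the same way by swapping the two predicates.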

3.1 Related work


Before discussing the other components, we discuss the relation between this and other work.
In [7], Chandra and Toueg solve Distributed Consensus in an asynchronous system using a
Failure Suspector, W, that satisfies certain (weak) requirements. [6] further shows that W
is the weakest suspector that can be used to solve Distributed Consensus. While we do not
consider consensus in this paper, we said in the Introduction that adding a majority
requirement to the VSC abstraction gives a simple, probabilistic solution to transaction commit.
Since there are no fundamental differences between solving consensus and the atomic commit
problem, how are the two approaches related (we will not, hereafter, distinguish consensus from
atomic commit)?

First, it should be clear that our Failure Suspector is not weaker than W. More importantly,
[7] also places a majority requirement on processes before W can be used to solve consensus.
To relate the two approaches, consider a generalization of consensus:

•  suppose consensus is to be solved more than once, and let consensus(i), for i > 0, be
the ith instance of the consensus problem;

•  let Proc be the initial set of processes that solve consensus(1);

•  consensus(i + 1) begins only after consensus(i) has been solved;

•  for consensus(i), i > 1, the processes choose their initial state randomly from the set
{0, 1}.

In [7], consensus(i) (for each i) would be solved by the same static set of processes Proc. The
majority requirement to solve consensus(i) is thus similar to a static voting scheme in the
context of handling replicated data [11]. This is because [7] considers that failure suspicions
are never stable: a process p believing failed(q) can always change its mind.

In contrast, in the VSC model, failure beliefs are stable each time a new view is defined.
Thus for i ≠ j, consensus(i) and consensus(j) need not be solved by the same set of
processes. Continuing the replicated data analogy, the majority requirement in the VSC
model is similar to the dynamic voting scheme [9], which has been shown to lead to higher
data availability than the static voting scheme.


4 The View Component


The view component operates whenever a link failure is repaired, a process begins executing
or recovers after a crash, and whenever the multicast component informs it that the current
view has terminated (Section 5). VC_p defines p's current view by interacting with other VC
components, and by using FS_p information.

VC_p defines a new view when it detects (or learns about through some other VC component)
agreement on CommSet_p() among the members of CommSet_p(). The new view will be the
largest subset of processes (containing p) satisfying this agreement.


4.1  The  View  Component  Algorithm 

In this section we outline how VC_p detects or learns about CommSet_p() agreement.

When VC_p is activated, it knows a near approximation of CommSet_p() from FS_p.^3 Whenever
VC_p receives a comm(r) message from FS_p, it updates this approximation. Along cut c, VC_p
uses a deterministic function, vc-Coord(p),^4 on the set CommSet_p(c) which returns a unique
process identifier, and satisfies

\[ \bigl(\mathit{CommSet}_p(c) = \mathit{CommSet}_q(c)\bigr) \;\Rightarrow\; \bigl(\text{vc-Coord}(p) = \text{vc-Coord}(q)\bigr). \]

For example, vc-Coord(p) might be “choose the ‘smallest’ identifier from CommSet_p(c).”
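
A minimal sketch of the "smallest identifier" rule, assuming process identifiers are strings compared lexicographically (the function name vc_coord and the string encoding are assumptions made for this example):

```python
from typing import AbstractSet


def vc_coord(comm_set: AbstractSet[str]) -> str:
    """Return the smallest identifier in a non-empty CommSet."""
    if not comm_set:
        raise ValueError("CommSet is empty; no coordinator can be chosen")
    return min(comm_set)
```

Because the result depends only on the contents of the set, two processes holding equal CommSets necessarily compute the same coordinator, which is the property required above.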

Each process also maintains a local counter, seq_p, which is incremented every time VC_p
considers vc-Coord(p) to have changed (this is not necessarily every time CommSet_p(c) changes.
For liveness, however, vc-Coord(p) must change when VC_p receives not-comm(vc-Coord(p))
from FS_p). The counter seq_p is initially zero and is essential in allowing us to define
non-intersecting, concurrent views. The tuple (p, seq_p) fully describes p on any consistent cut.

Finally, the formula CommSetEq(S) holds on c if and only if all p ∈ S have identical
CommSet() sets at c. That is,

\[ \mathrm{CommSetEq}(S) \;\stackrel{\mathrm{def}}{=}\; \bigwedge_{p,q \in S} \bigl(\mathit{CommSet}_p() = \mathit{CommSet}_q()\bigr) \]

In our protocol, each p sends its current CommSet_p() and current seq_p number to vc-Coord(p)
every time CommSet_p() changes.

^3 There may be notifications from FS_p that have not yet reached VC_p.

^4 Technically, we should name some cut explicitly since the function’s value depends on p’s indexical
can-communicate-with set. We omit the cut reference, but with the understanding that vc-Coord(p) has a
temporal dependence. In fact p never knows which particular cut it is on, but at any point in its execution
VC_p has some set of process identifiers that satisfy a certain condition. It determines a coordinator by
applying some rule to this set. The presence of c would only clarify matters for the omniscient reasoner.


4.2  Defining  the  New  View 

Let k = vc-Coord(p), and S = CommSet_k(c) for some cut c. Then VC_k receives CommSet_p()
for p ∈ S. Whenever it receives a different CommSet_p() from some p, VC_k discards the
previous one and checks whether CommSetEq(CommSet_k()) holds. If it does, VC_k sets the
new view, View_k(), to

\[ \mathit{View}_k() = V = \{\, (p, \mathit{seq}_p) \mid p \in \mathit{CommSet}_k() \,\} \tag{1} \]

The coordinator k then sends the new view to each VC_p (for p ∈ V), which then delivers the
view to MC_p. MC_p regains execution control and begins multicasting again. Unfortunately,
as CommSetEq(CommSet_k()) is not a stable property (i.e. one that, once true, is forever true) we must
take care in announcing the new view. We return to this issue in Section 6.
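
The coordinator's check and the construction of Equation (1) might be sketched as follows. This is a simplified illustration that ignores the stability issue just mentioned; the Report record and the functions comm_set_eq and try_define_view are hypothetical names, and the reports dictionary is assumed to hold the most recent (CommSet_p, seq_p) pair received from each p, so that storing a new report discards the previous one.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, Optional, Tuple

Signature = Tuple[str, int]  # (process identifier, sequence number)


@dataclass(frozen=True)
class Report:
    """Latest (CommSet_p, seq_p) pair that p sent to the coordinator."""
    comm_set: FrozenSet[str]
    seq: int


def comm_set_eq(members: FrozenSet[str], reports: Dict[str, Report]) -> bool:
    """CommSetEq(S): every member has reported, and all reports agree."""
    sets = [reports[p].comm_set for p in members if p in reports]
    return len(sets) == len(members) and len(set(sets)) <= 1


def try_define_view(
    k: str, reports: Dict[str, Report]
) -> Optional[FrozenSet[Signature]]:
    """Return View_k() as in Equation (1), or None while agreement is missing."""
    if k not in reports:
        return None
    members = reports[k].comm_set
    if not comm_set_eq(members, reports):
        return None
    return frozenset((p, reports[p].seq) for p in members)
```

For instance, with reports from a and b that both name {a, b} and carry sequence numbers 2 and 5, the function returns the view {(a, 2), (b, 5)}; if b's report still names a third process, it returns None.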


4.3  The  Partial  Order 

Correctness of VC_p means that the coordinator successfully sends the new view to the VC
components of all reachable members in the new view. We will henceforth use V to denote
the (local) view that is agreed upon by all the members of V.

Since process histories are linear, it makes sense to talk about the xth version of a process’s
(local) view; we denote this by View_p^x.

Definition Given two agreement views V and V', V ≺_1 V' if and only if there is some p in
V ∩ V' such that V = View_p^x and V' = View_p^{x+1}. The transitive closure of ≺_1 is denoted
≺. □

It is not hard to see that the views defined by the collection of VC_p components are partially
ordered by ≺. We say V and V' are concurrent if and only if they are not ≺-related.

Proposition 4.1 trivially follows from the definition of views (Equation 1) and the increment
rule for seq_p.

Proposition 4.1 Let V and V' be concurrent views. Then V ∩ V' = ∅.

5 The Multicast Component


The Multicast Component of process p, MC_p, is responsible for implementing
virtually-synchronous communication. MC_p operates in two modes. In one mode it multicasts messages
to the members of its current view View_p(). In the other mode, it flushes outstanding
multicasts to ensure they satisfy virtually-synchronous communication semantics, then terminates
the current view. The transition from multicast mode to termination mode is triggered by
any FS_p not-comm() or comm() message. In this section, we define VSC semantics and the
protocols MC_p uses.


5.1  Definitions 

Informally, virtually-synchronous communication is such that, for any view V, the processes
of view V that mutually believe each other alive deliver the same set of multicasts.^5 To make
the definition of VSC precise we need to define formally the set of messages considered to
have been multicast in V, as well as the subset of processes that deliver them.

Definition Given a view V, message m is a V-multicast if it was sent by some p along a
cut c such that View_p(c) = V. □

Definition (VSC) Let V ≺_1 V'. Then communication in a system is virtually-synchronous
if and only if all processes that are in both V and V' delivered the same set of V-multicasts.
Moreover, no message is delivered in more than one view. □

It is important to notice that process sequence numbers are not used in the definition.
These are low-level pieces of information; the application layer should only be concerned
with process identifiers. For an application-layer process, VSC ensures that if two processes
progress together from one view to another, then they delivered the same set of
messages in the first view. As a result, if process state is determined by an initial state and
the set of multicasts delivered to the process, VSC means that if processes begin executing
in view V in the same state, then switch together to view V', they will begin executing in
V' in the same state.

^5 For simplicity, we omit other forms of communication. Non-multicast communications do not introduce
new problems.


5.2 Two Modes of Operation

The component MC_p operates in two modes:


1.  in normal mode MC_p reliably multicasts messages issued by the application layer of p,
and delivers to the application layer multicasts it receives from other MCs;

2.  in view-termination mode MC_p does not multicast new messages; instead it attempts
to flush outstanding multicasts to ensure the VSC semantics.

After receiving a view from VC_p, MC_p is in normal mode. It enters view-termination mode as
soon as it receives any (in)accessibility notification from FS_p. When view-termination mode
ends, MC_p gives control back to VC_p. MC_p is inactive until it receives a new view from VC_p,
whereupon MC_p begins normal mode again.
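
The mode changes described above form a small state machine, which might be sketched as follows. The names Mode, MulticastComponent and the three event handlers are hypothetical; the sketch treats every FS_p notification as a trigger, as stated in this section, and leaves out the flushing work itself.

```python
from enum import Enum, auto
from typing import FrozenSet, Iterable, Optional, Tuple

Signature = Tuple[str, int]  # (process identifier, sequence number)


class Mode(Enum):
    INACTIVE = auto()          # waiting for a view from VC_p
    NORMAL = auto()            # multicasting and delivering
    VIEW_TERMINATION = auto()  # flushing outstanding multicasts


class MulticastComponent:
    """Illustrative skeleton of MC_p's mode transitions only."""

    def __init__(self) -> None:
        self.mode = Mode.INACTIVE
        self.view: Optional[FrozenSet[Signature]] = None

    def on_new_view(self, view: Iterable[Signature]) -> None:
        # VC_p delivered a view: (re)enter normal mode.
        self.view = frozenset(view)
        self.mode = Mode.NORMAL

    def on_fs_notification(self) -> None:
        # Any comm()/not-comm() from FS_p ends normal mode.
        if self.mode is Mode.NORMAL:
            self.mode = Mode.VIEW_TERMINATION

    def on_view_terminated(self) -> None:
        # Flushing finished: hand control back to VC_p.
        if self.mode is Mode.VIEW_TERMINATION:
            self.mode = Mode.INACTIVE

    def may_multicast_new(self) -> bool:
        return self.mode is Mode.NORMAL
```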


5.3 MC_p Normal Mode

Suppose VC_p defines a view V = View_p() and delivers this to MC_p. Recall that views are sets
of tuples, which we call process signatures:

\[ \mathit{View}_p() = \{\, \sigma_q = (q, \mathit{seq}_q) \,\}. \]

Upon receiving View_p(), MC_p enters normal mode, in which it multicasts and delivers
messages. Each message m issued by the application layer of process p is multicast by MC_p to
all q ∈ V. Before issuing the message, MC_p adds σ_p to m. Let sender(m) be the signature
of the process from which m originated.

When MC_p receives a message m, the following sequence of events occurs:

1.  MC_p delivers m (to the application layer) if sender(m) ∈ V, and discards m otherwise;

2.  MC_p also buffers any message m it receives and delivers in V until it knows all other
processes in V have received m.^6 When m is received by all processes in V we say it
is stable.

By delivering only V-multicasts, the normal mode ensures that no multicast can be delivered
in more than one view (see the VSC definition).

^6 There are many standard ways of achieving this - e.g. piggybacking information on messages.
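
The two receive-side rules of normal mode might be sketched as follows. The sketch assumes that stability is learned through explicit acknowledgements, which is only one of the standard ways alluded to in footnote 6; the names Message, NormalModeReceiver, receive and acknowledge are hypothetical, and duplicate suppression is omitted.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, Iterable, List, Set, Tuple

Signature = Tuple[str, int]  # (process identifier, sequence number)


@dataclass(frozen=True)
class Message:
    sender: Signature  # signature added by the originating MC
    payload: str


class NormalModeReceiver:
    """Illustrative receive path of MC_p in normal mode."""

    def __init__(self, me: Signature, view: Iterable[Signature]) -> None:
        self.me = me
        self.view: FrozenSet[Signature] = frozenset(view)
        self.delivered: List[Message] = []
        # Buffered message -> signatures known to have received it.
        self.buffer: Dict[Message, Set[Signature]] = {}

    def receive(self, m: Message) -> bool:
        """Deliver m if sender(m) is in V; otherwise discard it."""
        if m.sender not in self.view:
            return False
        self.delivered.append(m)
        self.buffer[m] = {self.me, m.sender}
        self._drop_if_stable(m)
        return True

    def acknowledge(self, m: Message, by: Signature) -> None:
        """Record that `by` has received m."""
        if m in self.buffer:
            self.buffer[m].add(by)
            self._drop_if_stable(m)

    def _drop_if_stable(self, m: Message) -> None:
        # m is stable once every member of V is known to have it.
        if self.buffer[m] >= self.view:
            del self.buffer[m]
```

Because the membership test is on signatures rather than bare identifiers, a message carrying (q, 1) is discarded in a view that contains (q, 0).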




5.4 MC_p View-Termination Mode

Consider a view V = View_p(). Component MC_p switches from normal mode to
view-termination mode after receiving from FS_p either 1) not-comm(q) for q ∈ View_p(), or 2)
comm(r) for r ∉ View_p(). This is because whenever a change in the communication topology
is detected a new view must be defined reflecting that change. However, before defining a
new view, MC in view-termination mode must ensure the VSC definition is satisfied.

Once MC_p enters view-termination mode, it need only consider relevant not-comm() events
from FS_p to terminate V. Thus, while executing in view-termination mode, MC_p builds its
own approximation of NotCommSet_p(). This means failure notifications have a permanent
effect until view-termination mode ends: comm(q) received by MC_p in view-termination mode
after not-comm(q) (for example due to a partition) cannot undo the not-comm(q) information.

Just as a new view for p is defined according to agreement on CommSet()s, successfully
terminating V involves partitioning V according to NotCommSet() agreement.


Definition The indexical set Survives_p(V) is V minus the set of processes MC_p believes failed
in V:

\[ \mathit{Survives}_p(V) = V - \{\, (q, \mathit{seq}_q) \mid \mathrm{NotComm}_p(q) \,\} \]
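
A small sketch of how MC_p might track its local approximation of NotCommSet_p() during view termination and derive Survives_p(V) from it. The class TerminationState and its methods are hypothetical names; the point illustrated is that a later comm(q) does not cancel an earlier not-comm(q).

```python
from typing import FrozenSet, Iterable, Set, Tuple

Signature = Tuple[str, int]  # (process identifier, sequence number)


class TerminationState:
    """Illustrative view-termination bookkeeping for one view V."""

    def __init__(self, view: Iterable[Signature]) -> None:
        self.view: FrozenSet[Signature] = frozenset(view)
        self.not_comm: Set[str] = set()  # MC_p's own NotCommSet approximation

    def on_fs_notification(self, kind: str, q: str) -> None:
        # Only not-comm(q) for members of V is relevant; comm() is ignored,
        # so a suspicion is permanent until this mode ends.
        if kind == "not-comm" and any(pid == q for pid, _ in self.view):
            self.not_comm.add(q)

    def survives(self) -> FrozenSet[Signature]:
        """Survives_p(V): V minus the signatures of suspected processes."""
        return frozenset(s for s in self.view if s[0] not in self.not_comm)
```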

Before we can explain how to ensure VSC, we need the following data structures.

Definition Consider V = View_p() and consistent cut c. The vector msg_p(V, c) (of size |V|)
is defined such that:

•  its pth component, msg_p(V, c)[p], is the number of V-multicasts that originated from p
(up to c);

•  for q ∈ V, q ≠ p, its qth component, msg_p(V, c)[q], is the number of V-multicasts MC_p
delivered up to c that originated from q. □

Definition (View Terminated) Consider view V and S such that ∅ ≠ S ⊆ Ids(V) (where
Ids(V) is the set of process identifiers appearing in V). Then VT(V, S) holds along cut c if
and only if

\[ \bigwedge_{p,q \in S} \Bigl( \bigl(\mathit{msg}_p(V,c) = \mathit{msg}_q(V,c)\bigr) \land \bigl(\mathit{Survives}_p(V,c) = \mathit{Survives}_q(V,c)\bigr) \Bigr) \]

It is not hard to see S = Ids(Survives_p(V)). □
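
Evaluated by an observer who can see every process's local state on one cut, the VT predicate might be sketched as follows. The function name view_terminated and the dictionary encodings of msg_p(V, c) and Survives_p(V, c) are assumptions made for this example; in the protocol itself no process has such a global view, and VT must be detected by exchanging messages.

```python
from typing import Dict, FrozenSet, Iterable, Tuple

Signature = Tuple[str, int]   # (process identifier, sequence number)
MsgVector = Dict[str, int]    # originator id -> number of V-multicasts


def view_terminated(
    S: Iterable[str],
    msg: Dict[str, MsgVector],
    surv: Dict[str, FrozenSet[Signature]],
) -> bool:
    """VT(V, S): all members of S agree on msg vectors and Survives sets."""
    members = list(S)
    if not members:
        return False  # the definition requires S to be non-empty
    ref = members[0]
    # Equality is transitive, so comparing against one reference suffices.
    return all(
        msg[p] == msg[ref] and surv[p] == surv[ref] for p in members
    )
```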


In other words, VT(V, S) is true exactly when the processes in S agree on both the messages
multicast in V and on their respective Survives(V) sets. For MC_p, detecting termination of
V = View_p() is thus reduced to detecting VT(V, S) (for p ∈ S ⊆ Ids(V)).
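
The predicate can be checked directly on a snapshot of the processes' states. The following
is a minimal sketch, assuming the snapshot is taken along a consistent cut and that Survives
sets are represented as sets of identifiers rather than signatures; the function name
`view_terminated` and the dictionary layout are illustrative only.

```python
def view_terminated(S, msg, survives):
    """Return True if VT(V, S) holds for the given snapshot.

    S        -- non-empty collection of process identifiers
    msg      -- dict: process id -> its msg vector (dict: id -> count)
    survives -- dict: process id -> its Survives set
    """
    members = list(S)
    if not members:
        raise ValueError("S must be non-empty")
    ref = members[0]
    # Pairwise agreement is equivalent to everyone agreeing with one member.
    return all(
        msg[p] == msg[ref] and survives[p] == survives[ref]
        for p in members
    )


msg = {
    "p": {"p": 2, "q": 1, "r": 0},
    "q": {"p": 2, "q": 1, "r": 0},
    "r": {"p": 1, "q": 1, "r": 0},
}
survives = {
    "p": {"p", "q"},
    "q": {"p", "q"},
    "r": {"p", "q", "r"},
}
```

With this snapshot, the predicate holds for S = {p, q} but not for S = {p, q, r}, and it
holds trivially for any singleton.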

Having detected VT(V, S), whether S = Ids(V) or S ⊂ Ids(V) is important in determining
the new view. In the first case, whatever view, V', VC_p later defines, VSC is satisfied with
respect to the pair (V, V'). In the second case MC_p must pass Survives_p(V) to VC_p; we will
want the new view to be a subset of Survives_p(V).

To guarantee that every non-crashed process in V eventually detects VT(V, S) for some S,
MC_p behaves as follows in view-termination mode:

• it stops multicasting new messages;⁷

• it rejects any message m such that sender(m) ∉ Survives_p(V);

• upon receiving not-comm(q) from FS_p (for q ∈ V), MC_p signs and forwards any V
multicasts originating from q that are still in p's buffer (Section 5.3). MC_p then removes
these messages from its buffer. The MC component of a recipient r rejects the re-issued
message if NOTCOMM_r(p) holds (i.e., if MC_r has received not-comm(p) from FS_r).⁸
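
The three rules above can be sketched as follows. This is an illustration under stated
assumptions, not the paper's implementation: the class name, the `send` callback and its
argument order are hypothetical, process identifiers stand in for signatures, and forwarded
messages are assumed to go to every other member of Survives_p(V).

```python
class ViewTerminationMode:
    """Hypothetical sketch of MC_p's three rules in view-termination mode."""

    def __init__(self, me, view_ids, send):
        self.me = me
        self.survives = set(view_ids)
        self.buffer = []      # (origin, payload) pairs still held by p
        self.send = send      # callable(dest, origin, payload, forwarder)

    def multicast(self, payload):
        # Rule 1: no new multicasts are issued in this mode.
        raise RuntimeError("view-termination mode: multicasting is stopped")

    def accepts(self, sender):
        # Rule 2: reject messages whose sender is not in Survives_p(V).
        return sender in self.survives

    def on_not_comm(self, q):
        # Rule 3: forward buffered multicasts that originated from q,
        # signed by p as forwarder, then drop them from the buffer.
        self.survives.discard(q)
        for origin, payload in self.buffer:
            if origin == q:
                for dest in sorted(self.survives - {self.me}):
                    self.send(dest, origin, payload, self.me)
        self.buffer = [m for m in self.buffer if m[0] != q]


sent = []
mc = ViewTerminationMode("p", ["p", "q", "r"],
                         lambda *args: sent.append(args))
mc.buffer = [("q", "m1"), ("r", "m2")]
mc.on_not_comm("q")
```

A recipient applies the same `accepts` check to the forwarder, which corresponds to
rejecting a re-issued message from a process it already believes inaccessible.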

⁷ If the network were a broadcast domain, MC_p could continue multicasting using a new
signature (p, seq_p + 1). The problem for less general environments is that the new multicast
view (destination set) is not yet known.

⁸ Duplicate messages are recognized and discarded as usual.

Proposition 5.1 Consider view-termination mode as described above. Then for each p ∈ V,
there exists a set S_p such that p ∈ S_p and VT(V, S_p) holds.

Proof (sketch) We introduce the following notation:

• $\mathrm{VT}_1(V,S) \equiv \bigwedge_{p,q \in S} \big(\mathit{msg}_p(V) = \mathit{msg}_q(V)\big)$

• $\mathrm{VT}_2(V,S) \equiv \bigwedge_{p,q \in S} \big(\mathit{Survives}_p(V) = \mathit{Survives}_q(V)\big)$

Consider p ∈ V. We build a sequence S_p^0, ..., S_p^i, ..., S_p^n, where ∀i, p ∈ S_p^i and
S_p^i ⊆ Ids(V), such that finally VT(V, S_p^n) holds. Initially take S_p^0 = Ids(V). The proof
ends as soon as VT(V, S_p^i) holds, for some i. If not, then VT_1(V, S_p^i) or VT_2(V, S_p^i)
does not hold. We obtain S_p^{i+1} from S_p^i by removing a process (if necessary). Because
(1) S_p^0 is finite, (2) the number of messages sent in a view is finite once view-termination
mode is started (processes do not issue new multicasts in this mode), and (3) VT(V, {p}) is
trivially true, the construction finally ends with S_p^n such that VT(V, S_p^n) holds. We
briefly discuss the proof reasoning for the case when either VT_1(V, S_p^i) or VT_2(V, S_p^i)
does not hold.

(a) If VT_1(V, S_p^i) does not hold, then the message set of some q in S_p^i differs from p's
in some component: ∃ q, r ∈ S_p^i : (msg_p(V)[r] ≠ msg_q(V)[r]). If eventually these sets
become equal, then take S_p^{i+1} = S_p^i. If not (i.e., msg_p(V)[r] never equals
msg_q(V)[r]), then either DOWN_r, or NOTCOMM_r(p), or NOTCOMM_r(q) holds. So suppose
NOTCOMM_r(p) holds (analogous arguments hold for NOTCOMM_r(q) and DOWN_r). Then eventually
NOTCOMM_p(r) holds (from FS_p Reciprocity). The Reissuing rule in view-termination mode
means that p will forward to q all messages it received from r that q did not. However, since
the message sets never agree, this transfer will not succeed completely before NOTCOMM_q(p)
eventually holds. Reciprocity ensures that NOTCOMM_p(q) holds, and at this point we define
S_p^{i+1} to be S_p^i - {q}.

(b) If VT_2(V, S_p^i) does not hold, then there is some q in S_p^i such that Survives_p(V) ≠
Survives_q(V). Without loss of generality let r ∈ Survives_q(V) - Survives_p(V). Then
Inaccessibility Propagation and Reciprocity mean that eventually either NOTCOMM_q(r) or
NOTCOMM_p(q) holds. In the first case take S_p^{i+1} to be S_p^i - {q}; in the second case,
take S_p^{i+1} = S_p^i. □

5.5 An Algorithm to Detect VT(V, S)

Like the VC_p algorithm detecting COMMSETEQ(), the MC_p algorithm detecting VT(V, S_p)
relies on a coordinator process. MC_p determines its view-termination coordinator with a
deterministic function, mc-Coord(p), on the set Survives_p(V, c). We require that for p and q
in V with identical Survives(V) sets, mc-Coord(p) = mc-Coord(q).

Let x = mc-Coord(p). Then x attempts to detect VT(V, Survives_x(V)). MC_p also increments
the sequence number counter, seq_p, whenever MC_p considers mc-Coord(p) to have changed (for
liveness, the function mc-Coord(p) must change whenever MC_p receives not-comm(mc-Coord(p))
from FS_p).
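
One deterministic choice that meets both requirements is to take the smallest identifier in
the Survives set. The sketch below assumes this rule, which the paper does not prescribe;
`mc_coord` and `CoordinatorTracker` are hypothetical names, and identifiers stand in for
signatures. Because p never removes itself, the Survives set is never empty.

```python
def mc_coord(survives):
    """Hypothetical choice of coordinator: the smallest surviving identifier."""
    return min(survives)


class CoordinatorTracker:
    """Tracks MC_p's coordinator and bumps seq_p whenever it changes."""

    def __init__(self, me, view_ids):
        self.me = me
        self.survives = set(view_ids)
        self.seq = 0
        self.coord = mc_coord(self.survives)

    def on_not_comm(self, q):
        # Removing q from Survives_p(V) forces a new coordinator if q was it.
        self.survives.discard(q)
        new_coord = mc_coord(self.survives)
        if new_coord != self.coord:
            self.coord = new_coord
            self.seq += 1
        return self.coord


p = CoordinatorTracker("c", ["a", "b", "c"])
first = p.coord
p.on_not_comm("b")
after_b = (p.coord, p.seq)
p.on_not_comm("a")
after_a = (p.coord, p.seq)
```

Two processes with identical Survives sets compute the same coordinator, and a not-comm()
for the current coordinator always changes the result, as liveness requires.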

Process p sends msg_p(V), Survives_p(V), and seq_p to mc-Coord(p) when MC_p first considers
mc-Coord(p) to be its coordinator, and whenever msg_p(V) or Survives_p(V) is modified.
If x = mc-Coord(p), then:

$$\mathrm{VT}(V, \mathit{Survives}_x(V)) \iff \bigwedge_{p \in \mathit{Survives}_x(V)} \big( \mathit{msg}_x(V) = \mathit{msg}_p(V) \wedge \mathit{Survives}_x(V) = \mathit{Survives}_p(V) \big)$$

Proposition 5.2 Consider a view V, with p ∈ V, and the view-termination protocol described
above. Then eventually, either p crashes or it detects VT(V, Survives_x(V)).


PROOF (sketch) The proof is similar to that of Proposition 5.1. Here, we consider the
perspective of x = mc-Coord(p). The problem is that, due to transmission delays, x may
not detect VT(V, Survives_x(V)) as soon as it holds (transmission of msg_p(V) and
Survives_p(V) from p to x).

There are two cases: eventually x receives the messages enabling it to detect
VT(V, Survives_x(V)), or failures prevent x from detecting it. In the second case, if both
COMM_p(x) and COMM_x(p) hold, we can use the iterative construction, from the perspective
of x, in the proof of Proposition 5.1. Otherwise we must consider the iterative construction
with respect to y, the coordinator replacing x once it is no longer a member of
Survives_p(V). □
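
The coordinator's side of this protocol can be sketched as a table of the latest reports.
The names below are hypothetical, identifiers stand in for signatures, and the rule that a
report with an older sequence number is stale is an assumption of this sketch rather than
something the text specifies.

```python
class VtCoordinator:
    """Hypothetical state kept by x = mc-Coord(p) while detecting VT."""

    def __init__(self, me, msg, survives):
        self.me = me
        # process id -> (seq, msg vector, Survives set) from the latest report
        self.reports = {me: (0, dict(msg), frozenset(survives))}

    def on_report(self, p, seq, msg, survives):
        # Assumption: a report with an older seq than the stored one is stale.
        old = self.reports.get(p)
        if old is None or seq >= old[0]:
            self.reports[p] = (seq, dict(msg), frozenset(survives))

    def detected(self):
        _, my_msg, my_survives = self.reports[self.me]
        for p in my_survives:
            report = self.reports.get(p)
            if report is None:
                return False          # still waiting for p's report
            _, msg, survives = report
            if msg != my_msg or survives != my_survives:
                return False
        return True


x = VtCoordinator("x", {"x": 1, "p": 0}, {"x", "p"})
before = x.detected()
x.on_report("p", 0, {"x": 1, "p": 0}, {"x", "p"})
after = x.detected()
x.on_report("p", 0, {"x": 0, "p": 0}, {"x", "p"})
after_change = x.detected()
```

The last call shows why detection may lag behind the property itself: the coordinator only
sees the property through reports, and a newer report can invalidate an earlier detection.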

Finally, the fact that VT(V, S) is not stable poses the same problems as those posed by
the instability of COMMSETEQ(). We consider both in the next section.


6 Instability of COMMSETEQ() and VT(V, S)

As described in the previous sections, once VC_p learns COMMSETEQ(CommSet_p()) it switches
control to MC_p; switching control from MC_p to VC_p is based on detecting VT(View_p(), S).
In both cases, the relevant property is not stable: it may become false after holding
along some cut. Let switch(VC, V') be the message announcing the new view, V', and
switch(MC, Survives()) be the message announcing termination of view V.

Since neither COMMSETEQ(S) nor VT(V, S) is a stable property, we can arrive at the
following situation⁹:

• Take p, q ∈ V such that p and q believe each other accessible, and let k be their mutual
VC coordinator (k = vc-Coord(p) = vc-Coord(q)). Suppose VC_k determines the new
view, V' (k, p, q ∈ V'), sends switch(VC, V') to p only, and then crashes. VC_p, upon
receiving switch(VC, V'), adopts View_p() = V' and switches control to MC_p in normal
mode.

• Now suppose that in addition to VC_q not getting switch(VC, V'), FS_q notifies VC_q that
k is inaccessible; q continues executing in VC_q, waiting for some new coordinator k' to
inform it of the new view. In particular, suppose k' = p.

• Since p and q continue to believe each other accessible, FS_q gossips not-comm(k) to FS_p.
At this point, MC_p enters view-termination mode for view View_p() = V', and q is still
executing in VC_q waiting to receive the successor view to V. Observe that unless one
of the processes crashes or a network partition splits them, p and q need never believe
each other inaccessible.

• For VC_q to make progress, its coordinator VC_p must tell it some new view. Unfortunately,
VC_p cannot begin executing until MC_p leaves view-termination mode. MC_p
cannot leave view-termination mode until it receives Survives_q() from MC_q (after all,
q ∈ V' and q ∈ CommSet_p()). In other words, p and q are deadlocked because their
execution controls are out of phase. The control discrepancy prevents either one (VC_q
or MC_p) from making progress until one of them believes the other inaccessible: q is
stuck in VC_q, and p is stuck in MC_p.

⁹ While we illustrate instability with COMMSETEQ() and the switch from VC_p to MC_p, a
similar situation arises for VT(V, S) as well.
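
The scenario above amounts to a cycle in a wait-for graph over the four components involved.
The following sketch is only an illustration of that reading: the graph encoding and the
function `has_cycle` are hypothetical, and the edges restate the dependencies described in
the bullets.

```python
def has_cycle(waits_for):
    """Detect a cycle in a wait-for graph given as dict: node -> list of nodes."""
    visiting, done = set(), set()

    def visit(node):
        if node in done:
            return False
        if node in visiting:
            return True               # back edge: a cycle
        visiting.add(node)
        if any(visit(nxt) for nxt in waits_for.get(node, ())):
            return True
        visiting.discard(node)
        done.add(node)
        return False

    return any(visit(node) for node in list(waits_for))


deadlocked = {
    "MC_p": ["MC_q"],   # MC_p waits for Survives_q() from MC_q
    "MC_q": ["VC_q"],   # MC_q cannot run before VC_q installs a view
    "VC_q": ["VC_p"],   # VC_q waits for its coordinator VC_p
    "VC_p": ["MC_p"],   # VC_p cannot run until MC_p leaves view termination
}
# Once p believes q inaccessible, MC_p no longer waits for MC_q.
released = dict(deadlocked, MC_p=[])
```

The cycle disappears as soon as one process believes the other inaccessible, which matches
the observation that only a crash or a partition would otherwise release p and q.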

While  processes  being  out  of  phase  is  not  always  destructive,  and  in  fact  is  quite  natural 
whenever  partitions  occur,  it  is  destructive  in  this  case  since  it  induces  deadlock.  The 
following  precludes  deadlock. 

6.1  Component-Switch  Protocol 

Let k be shorthand for vc-Coord(p) when VC_p is executing. We describe the protocol only
for the switch from VC_p to MC_p; the situation is analogous for the reverse switch. Let
V = View_p(). We define the following concepts as depicted in Figure 2:

• From Section 4, each accessibility notification from FS_p forces VC_p to inform its
coordinator VC_k of the change to CommSet_p(). Let VC-alert_p() denote the message VC_p
sends to VC_k to inform VC_k of the change to CommSet_p().

• Let FS-VC-Notify_p(V') be the set of not-comm(q) and comm(r) accessibility notifications
VC_p received from FS_p after sending its first CommSet_p() to any coordinator and before
receiving switch(VC, V') from VC_k.

So given V' and FS-VC-Notify_p(V'), VC_p can infer which VC-alert_p() messages reached VC_k
before it detected COMMSETEQ(CommSet_k()) and which did not. Let FS-VC-Late_p be the
subset of FS-VC-Notify_p(V') for which the corresponding VC-alert_p() message did not reach
VC_k.

Figure 2: FS-VC-Notify_p(V') (lightly-shaded rectangle), VC-alert_p(), FS-VC-Late_p
(darkly-shaded rectangle)

The Component-Switch protocol for VC_p is:

1. The coordinator k sends the switch(VC, V') message using a best-effort reliable
multicast [14] (a process receiving the message reissues it to all the destination processes).

2. Upon receiving switch(VC, V'), VC_p:

(a) logically reorders it to be before VC_p sent any of the messages in FS-VC-Late_p (this
will be clearer after 3);

(b) installs V' as View_p() and switches control to MC_p, in normal mode.

3. MC_p handles messages in FS-VC-Late_p as if the corresponding notifications from FS_p had
just arrived (i.e., while MC_p is executing, and not while VC_p was executing). Specifically,
MC_p simulates receiving these accessibility notifications in View_p() = V'.
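
The steps above can be sketched from the point of view of a single process. The sketch is
illustrative only: the class, the `reissue` callback, and in particular the parameter
`alerts_reached` are hypothetical. How VC_p infers which alerts reached the coordinator is
not modelled here; the sketch simply takes the number of alerts that did reach it as given
and treats the remaining notifications as FS-VC-Late_p.

```python
class Process:
    """Hypothetical sketch of p's side of the Component-Switch protocol."""

    def __init__(self, reissue):
        self.mode = "VC"
        self.view = None
        self.fs_vc_notify = []    # notifications received while VC_p runs
        self.mc_events = []       # notifications handled by MC_p
        self.reissue = reissue    # callable(view): forward switch(VC, view)

    def on_fs_notification(self, kind, q):
        if self.mode == "VC":
            # VC_p would also send VC-alert_p() to its coordinator here.
            self.fs_vc_notify.append((kind, q))
        else:
            self.mc_events.append((kind, q))

    def on_switch(self, new_view, alerts_reached):
        if self.mode == "MC" and self.view == new_view:
            return                                 # duplicate of a reissued switch
        self.reissue(new_view)                     # step 1: reissue to destinations
        late = self.fs_vc_notify[alerts_reached:]  # FS-VC-Late_p
        self.view, self.mode = new_view, "MC"      # step 2(b): install V'
        self.fs_vc_notify = []
        for kind, q in late:                       # step 3: replay in the new view
            self.mc_events.append((kind, q))


forwarded = []
p = Process(forwarded.append)
p.on_fs_notification("not-comm", "k")
p.on_fs_notification("comm", "r")
p.on_switch(("p", "q", "r"), alerts_reached=1)
p.on_switch(("p", "q", "r"), alerts_reached=1)   # duplicate, ignored
```

Replaying the late notifications after installing V' is what realizes the logical
reordering of step 2(a): from MC_p's perspective they arrive in the new view.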

Proposition  6.1  The  Component-Switch  Protocol  prevents  deadlock. 

PROOF (sketch) We restrict this discussion to a process p, in view V, switching from its VC_p
to MC_p component, and suppose q ∈ V. Suppose q never switches from VC_q to MC_q in view
V. We show this does not prevent p from later switching from MC_p back to VC_p.

Because p switches to MC_p in view V, p has received the switch(VC, V) message. By the
Component-Switch Protocol, p has reissued switch(VC, V) to q. Then either:

1. q never receives switch(VC, V), or

2. q receives switch(VC, V) after having already switched to MC_q in view V' with V' ≠ V.

In the first case, NOTCOMM_q(p) holds eventually. In the second, p ∈ V' contradicts p ∈
V. Thus, NOTCOMM_q(p) holds, and FS_p Reciprocity means eventually either p crashes or
NOTCOMM_p(q) holds. Once NOTCOMM_p(q) holds, p's progress (i.e., switching back to VC_p)
is decoupled from q's progress; q cannot be responsible for blocking p. □


7  Concluding  Remarks 

This paper has shown how to implement virtually-synchronous communication using a
three-component architecture for systems that experience process crash failures and network
partitions. The three-component architecture led us to define a clear semantics for a Failure
Suspector (a necessary part of any live, asynchronous system) that guarantees liveness of the
VC and MC components. Clearly defining these semantics allows one to implement the Failure
Suspector as a modular tool, distinct from all other components, whose implementation
can take advantage of the characteristics of the underlying network.

Considering a membership service in relation to virtually-synchronous communication also
led us to better understand the need for a strong-partial compared to a weak-partial
membership service. Specifically, a strong-partial membership service (non-intersecting
concurrent views) is naturally related to virtually-synchronous communication. We can
understand this in the following way. The MC component must identify the sender of a message
by its signature (q, seq_q) to ensure that no multicast is delivered in more than one view.
This led us to define a view as a set of process signatures. Considering the increment
conditions of seq_p, two different views V and V' trivially have an empty intersection. In
other words, by requiring that no multicast be delivered in more than one view, we were led
to the strong-partial membership service. However, if we remove the MC component (i.e., if
the membership service is only defined by FS and VC, without any reference to communication),
then the sequence number seq_p has no clear justification. In that case, a view is just a set
of process identifiers (or a set of identifiers and an incarnation number). With this
definition, the same VC protocol we described would define concurrent views that overlap,
providing only a weak-partial membership service.